Automata-guided Context-free parsing for punctuationless languages

Author

  • Serge Rosmorduc

Abstract

We propose a system for analyzing texts written in languages that do not use punctuation, with syntactic tagging in mind. The core system is a simple chart parser; to cope with problems of complexity and ambiguity, we use simplified finite-state automata that guide the analysis. An application to Ancient Egyptian texts is presented.
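
The abstract does not say exactly how the automata constrain the chart parser. The sketch below assumes one plausible reading: a simplified finite-state automaton first segments the untagged, punctuationless token stream into chunks, and the chart parser (here a plain CKY parser) then refuses to build any constituent that crosses a chunk boundary. Everything in it is hypothetical and chosen for illustration only: the toy English grammar and lexicon, the `Det? N` chunk automaton, and the function names `find_chunks`, `crosses`, `parse`, and `spans`. It is not the author's system or his Ancient Egyptian grammar.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (illustrative, not from the paper):
# (left child, right child) -> set of parent categories.
BINARY = {
    ("Det", "N"): {"NP"},
    ("N", "N"): {"N"},  # noun compounds: a source of spurious analyses
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}

# Hypothetical lexicon: word -> possible preterminal tags (ambiguity allowed).
LEXICON = {
    "the": {"Det"},
    "scribe": {"N"},
    "copies": {"V", "N"},
    "text": {"N", "V"},
}

# Simplified finite-state automaton over tags that recognises chunks of the form Det? N.
CHUNK_FSA = {(0, "Det"): 1, (0, "N"): 2, (1, "N"): 2}
CHUNK_FINAL = {2}


def find_chunks(words):
    """Scan left to right, keeping the longest automaton match at each position.

    The automaton is simulated as a set of states so that ambiguous tags are allowed.
    Returns chunk spans as (start, end) pairs with end exclusive.
    """
    chunks, i = [], 0
    while i < len(words):
        states, best_end = {0}, None
        for j in range(i, len(words)):
            states = {
                CHUNK_FSA[(s, t)]
                for s in states
                for t in LEXICON[words[j]]
                if (s, t) in CHUNK_FSA
            }
            if not states:
                break
            if states & CHUNK_FINAL:
                best_end = j + 1
        if best_end is None:
            i += 1  # no chunk starts here; move on
        else:
            chunks.append((i, best_end))
            i = best_end
    return chunks


def crosses(i, k, chunks):
    """True if span [i, k) partially overlaps a chunk without either containing the other."""
    return any(i < a < k < b or a < i < b < k for a, b in chunks)


def parse(words, guided=True):
    """CKY chart parser; chart[(i, k)] is the set of categories spanning words[i:k]."""
    n = len(words)
    chunks = find_chunks(words) if guided else []
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[(i, i + 1)] = set(LEXICON[w])
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            if crosses(i, k, chunks):
                continue  # the automaton vetoes this span, so no constituent is built
            for j in range(i + 1, k):
                for b in chart[(i, j)]:
                    for c in chart[(j, k)]:
                        chart[(i, k)] |= BINARY.get((b, c), set())
    return chart


def spans(chart):
    """Constituents wider than one word, as sorted (start, end, category) triples."""
    return sorted(
        (i, k, cat) for (i, k), cats in chart.items() if k - i > 1 for cat in cats
    )
```

With `sentence = "the scribe copies the text".split()`, the unguided parser also builds the noun compound "scribe copies" and the NP "the scribe copies"; the guided parser skips both because they cross the chunk "the scribe", while the sentence-level S analysis survives. The trade-off of this kind of pruning is that a wrong chunk from the automaton would discard correct constituents as well.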

Similar resources

Parsing with Pictures

The development of elegant and practical algorithms for parsing context-free languages is one of the major accomplishments of 20th-century computer science. These algorithms are presented in the literature using string rewriting systems or abstract machines such as pushdown automata, but the resulting descriptions are unsatisfactory for several reasons. First, even a basic understanding of parsing a...

Language Approximation With One-Counter Automata

We present a method for approximating context-free languages with one-counter automata. This approximation allows the reconstruction of parse trees of the original grammar. We identify a decidable superset of regular languages whose elements, i.e. languages, are recognized by one-counter automata.

A Note on the Succinctness of Descriptions of Deterministic Languages

The result proved in this paper is that for the elements of some infinite class of deterministic context-free languages, the size of the deterministic pushdown automata needed to describe them is not recursively bounded by the size of the smallest unambiguous context-free grammars that generate them. This is a quantitative explanation of the fact that some languages require large descriptions in term...

Balanced Context-Free Grammars, Hedge Grammars and Pushdown Caterpillar Automata

The XML community generally takes trees and hedges as the model for XML document instances and element content. In contrast, Berstel and Boasson have discussed XML documents in the framework of extended context-free grammars, modeling XML documents as Dyck strings and schemas as balanced grammars. How can these two models be brought closer together? We examine the close relationship between Dyck ...

Restarting Automata with Auxiliary Symbols and Small Lookahead

We present a study on lookahead hierarchies for restarting automata with auxiliary symbols and small lookahead. In particular, we show that there are just two different classes of languages recognised by RRWW automata, through the restriction of lookahead size. We also show that the respective (left-) monotone restarting automaton models characterise the context-free languages and that the resp...

Publication details

Publication date: 2007